Named Entity Recognition in Broadcast News Using Similar Written Texts
Authors
Abstract
We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we detect block alignments between highly similar blocks of the speech data and corresponding written news data that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recover named entities that were initially missed by the speech transcriber. We show that our method is able not only to find the named entities missing from the transcribed speech data, but also to correct incorrectly assigned named entity tags. Consequently, our approach improves state-of-the-art results for NER on speech data in terms of both recall and precision.
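The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the alignment threshold, or the recovery rule, so the bag-of-words cosine similarity, the `threshold` value, and the function names `align_blocks` and `recover_entities` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; whitespace tokenization is a simplification."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_blocks(speech_blocks, written_blocks, threshold=0.3):
    """Step 1 (assumed form): pair each speech block with its most
    similar written block, keeping only pairs above the threshold."""
    alignments = {}
    for i, speech in enumerate(speech_blocks):
        speech_vec = bag_of_words(speech)
        scores = [cosine(speech_vec, bag_of_words(w)) for w in written_blocks]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            alignments[i] = best
    return alignments


def recover_entities(speech_entities, written_entities, alignments):
    """Step 2 (assumed form): for each aligned pair, list the entities
    tagged in the written block that the speech block does not contain."""
    recovered = {}
    for speech_idx, written_idx in alignments.items():
        known = {name.lower() for name, _ in speech_entities[speech_idx]}
        recovered[speech_idx] = [
            (name, tag)
            for name, tag in written_entities[written_idx]
            if name.lower() not in known
        ]
    return recovered
```

In this sketch, entity lists are given per block as `(name, tag)` pairs produced by some upstream tagger. A real system would need a more robust similarity measure and a way to place each recovered entity at the right position in the transcript, neither of which is covered here.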
Similar resources
The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011
The Tanl tagger is a flexible sequence labeller based on a Conditional Markov Model that can be configured to use different classifiers and to extract features according to feature templates expressed through patterns provided in a configuration file. The Tanl Tagger was applied to the task of Named Entity Recognition (NER) on Transcribed Broadcast News of Evalita 2011. The goal of the task was t...
Named Entity Recognition in Chinese News Comments on the Web
News comments are a new text genre in the Web 2.0 era. Many people write comments to express their opinions about recent news events or topics after reading news articles. Because news comments are freely written without checking, they are very different from formal news texts. In particular, named entities in news comments are usually composed of some wrongly written words, informal abbr...
Named Entity Recognition on Transcribed Broadcast News Guidelines for Participants
In the Named Entity Recognition (NER) task, systems are required to recognize different types of Named Entities (NEs) in Italian texts. As in the previous editions of EVALITA, we distinguish four NE types: Person (PER), Organization (ORG), Location (LOC) and Geo-Political Entities (GPE). Participant systems should identify both the correct extension and type of each NE. The output of participan...
An IR-Inspired Approach to Recovering Named Entity Tags in Broadcast News
We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we automatically detect document alignments between highly similar speech documents and corresponding written news stories that are easily obtainable from the Web; (2) we employ term expansion techniques commonly used in information retrieval to recove...
Improving Persian Named Entity Recognition Using the Kasra Ezafe
Named entity recognition is a process in which people's names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...